Take-home Exercise 1: Geospatial Analytics for Public Good

Author

Pham Hung Son

Published

September 14, 2025

Modified

September 15, 2025

1 Overview

Inside a Hawker Center (via TripAdvisor)

The Food & Beverage (F&B) and Hospitality sector is a vital pillar of Singapore’s economy, contributing significantly to both employment and tourism-driven revenue. In 2024, international visitor arrivals reached about 16.5 million, and tourism receipts rose to S$29,781 million (about S$29.8 billion), a strong recovery towards pre-pandemic levels. Among the components of tourism spend, expenditure on Food & Beverage increased by 6% year-on-year in January to September 2024. The F&B services sector also plays a major role domestically: in July 2025, for example, its sales reached S$1.0 billion, with fast food outlets and food caterers showing notable growth, and roughly a quarter of F&B sales taking place online.

Beyond the raw numbers, the sector deeply shapes everyday life (socialising, culture, lifestyle) through restaurants, hawker centres, cafés, bars, catering, and more. It is also a key part of the hospitality value chain: hotels not only provide lodging but often generate substantial revenue from associated F&B (restaurants, banquets). The dynamic interplay between food manufacturers, drink producers, and service-based outlets influences urban planning (where clusters of eating and dining emerge), business location decisions, land use, and labour markets.

2 Getting Started

2.1 The Packages

In this exercise, we will use the following packages:

Package Description
sf Provides functions to manage, process, and manipulate Simple Features, a formal geospatial data standard that specifies a storage and access model for spatial geometries such as points, lines, and polygons.
spatstat Provides functions for spatial statistics with a strong focus on analysing spatial point patterns.
raster Provides functions to read, write, manipulate, analyse, and model gridded spatial data (i.e. rasters). In this exercise, it will be used to convert image output generated by spatstat into raster format.
tidyverse Provides a collection of functions for performing data science tasks such as importing, tidying, wrangling, and visualising data.
tmap Provides functions for plotting cartographic-quality static point pattern maps or interactive maps using the leaflet API.
httr Provides a wrapper for the curl package, customised to the demands of modern web APIs. Key features include functions for the most important HTTP verbs: GET(), HEAD(), PATCH(), PUT(), DELETE(), and POST().
dplyr Provides a set of verbs that help solve the most common data manipulation challenges.
lubridate Provides functions to handle date and date-time data types.
pacman::p_load(sf, spatstat, raster, tidyverse, tmap, httr) # dplyr and lubridate are loaded as part of the tidyverse

2.2 The Data

2.2.1 Business Categories Overview

Using the Singapore Standard Industrial Classification (SSIC) 2020, three business categories were selected for analysis: Food and Beverage Service Activities (Division 56), Manufacture of Other Food Products (Group 107), and Manufacture of Beverages (Group 110). Together, they span the spectrum from food production and processing to retail consumption, providing a comprehensive view of how food-related businesses emerge and evolve across the city. Examining their spatial and temporal patterns allows us to understand not only where F&B establishments are located, but also how they interact with upstream manufacturers and beverage producers. This holistic perspective offers valuable insights into the clustering of economic activities, the shaping of commercial districts, and the role of the F&B ecosystem in supporting urban development and community life.

Below are the details of the selected business types:

56 – Food and Beverage Service Activities

  • Division 56: Food and beverage service activities
    • Includes: Restaurants, cafés, fast food outlets, food courts, coffee shops, hawker centers, caterers, pubs, bars, nightclubs, and other drinking places.

    • Covers businesses primarily engaged in preparing meals, snacks, and drinks for immediate consumption.

107 – Manufacture of Other Food Products

  • Division 10: Manufacture of food products

  • Group 107: Manufacture of other food products

    • Includes:

      • 1071 – Manufacture of bakery products (bread, biscuits, cakes, confectionery)

      • 1074 – Manufacture of macaroni, noodles, vermicelli and related products

      • 1075 – Manufacture of prepared meals and dishes (e.g., frozen dinners)

      • 1076 – Manufacture of coffee, tea and related products

      • 1079 – Manufacture of other food products n.e.c. (sauces, soya products, spices, chips, crackers, titbits)

110 – Manufacture of Beverages

  • Division 11: Manufacture of beverages

  • Group 110: Manufacture of beverages

    • Includes:

      • 1101 – Distilling, rectifying and blending of spirits

      • 1102 – Manufacture of wines

      • 1103 – Manufacture of malt liquors and malt (beer, stout)

      • 1104 – Manufacture of soft drinks, bottled/mineral waters, cordials, syrups, composite concentrates, and ice.
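The selection above implies a simple prefix rule: Division 56 is identified by the first two digits of an SSIC code, while Groups 107 and 110 are identified by the first three. The rule can be sketched as a small helper; `classify_ssic` is a hypothetical illustration, not part of the workflow that follows:

```r
# Hypothetical helper illustrating the SSIC prefix rule: Division 56 matches
# on the first two digits, Groups 107 and 110 on the first three.
classify_ssic <- function(code) {
  code <- gsub("[^0-9]", "", as.character(code))   # keep digits only
  d2 <- substr(code, 1, 2)
  d3 <- substr(code, 1, 3)
  ifelse(d2 == "56",  "F&B Services (Div 56)",
  ifelse(d3 == "107", "Food Manufacturing (Grp 107)",
  ifelse(d3 == "110", "Beverage Manufacturing (Grp 110)", "Other")))
}

classify_ssic(c("56111", "10712", "11040", "47722"))
# "F&B Services (Div 56)" "Food Manufacturing (Grp 107)"
# "Beverage Manufacturing (Grp 110)" "Other"
```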

2.2.2 Datasets

Dataset Name Description Format Source
ACRA (Accounting and Corporate Regulatory Authority) Information on Corporate Entities Datasets of registered entities in Singapore. (We will select business entities registered between 1st January 2024 to 30th June 2025.) CSV data.gov.sg
SSIC: Singapore Standard Industrial Classification (SSIC) Official system for classifying economic activity of business establishments in Singapore. XLSX SSIC
Singapore Master Plan 2019 (URA) Zoning Data Singapore’s land-use zones: residential, commercial, mixed-use, industrial. GEOJSON data.gov.sg
LTA MRT Station Exit Geospatial data for Singapore’s MRT/LRT stations exit. GEOJSON data.gov.sg

2.3 Data Importing & Wrangling

2.3.1 Importing ACRA data

The ACRA Corporate Entities CSV files were imported into R using tidyverse. The first 24 columns were retained, covering entity identifiers, names, incorporation dates, SSIC codes, and addresses. This ensures consistency across all files while dropping unnecessary fields, such as those with excessive missing values or only a single value.

folder_path <- "data/aspatial/ACRA"
file_list <- list.files(path = folder_path,
                        pattern = "^ACRA.*\\.csv$",
                        full.names = TRUE)

acra_data <- file_list %>%
  map_dfr(read_csv)
glimpse(acra_data)
Rows: 2,026,935
Columns: 53
$ uen                               <chr> "00022100K", "00031800X", "00043100A…
$ issuance_agency_id                <chr> "ACRA", "ACRA", "ACRA", "ACRA", "ACR…
$ entity_name                       <chr> "A Y ABDUL RAHIMAN", "A M ABDULLAH S…
$ entity_type_description           <chr> "Sole Proprietorship/ Partnership", …
$ business_constitution_description <chr> "Partnership", "Partnership", "Partn…
$ company_type_description          <chr> "na", "na", "na", "na", "na", "na", …
$ paf_constitution_description      <chr> "na", "na", "na", "na", "na", "na", …
$ entity_status_description         <chr> "Terminated", "Terminated", "Termina…
$ registration_incorporation_date   <date> 1974-10-09, 1974-10-12, 1974-09-20,…
$ uen_issue_date                    <date> 1974-10-09, 1974-10-12, 1974-09-20,…
$ address_type                      <chr> "LOCAL", "LOCAL", "LOCAL", "LOCAL", …
$ block                             <chr> "51", "93", "178", "21", "30", "38A"…
$ street_name                       <chr> "EAST COAST ROAD", "MARKET STREET", …
$ level_no                          <chr> "na", "10", "na", "na", "na", "na", …
$ unit_no                           <chr> "na", "01", "na", "na", "na", "na", …
$ building_name                     <chr> "na", "na", "na", "na", "na", "na", …
$ postal_code                       <chr> "428770", "0104", "0718", "0923", "4…
$ other_address_line1               <chr> "na", "na", "na", "na", "na", "na", …
$ other_address_line2               <chr> "na", "na", "na", "na", "na", "na", …
$ account_due_date                  <chr> "na", "na", "na", "na", "na", "na", …
$ annual_return_date                <chr> "na", "na", "na", "na", "na", "na", …
$ primary_ssic_code                 <dbl> 47722, 46301, 68104, 56111, 47102, 6…
$ primary_ssic_description          <chr> "na", "na", "na", "na", "na", "na", …
$ primary_user_described_activity   <chr> "na", "na", "na", "na", "na", "na", …
$ secondary_ssic_code               <chr> "na", "32909", "na", "56122", "na", …
$ secondary_ssic_description        <chr> "na", "na", "na", "na", "na", "na", …
$ secondary_user_described_activity <chr> "na", "na", "na", "na", "na", "na", …
$ no_of_officers                    <dbl> 7, 6, 3, 1, 5, 2, 2, 1, 4, 2, 2, 1, …
$ former_entity_name1               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name2               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name3               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name4               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name5               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name6               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name7               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name8               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name9               <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name10              <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name11              <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name12              <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name13              <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name14              <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name15              <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm1                <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm1               <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm2                <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm2               <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm3                <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm3               <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm4                <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm4               <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm5                <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm5               <chr> "na", "na", "na", "na", "na", "na", …

The dataset consists of 2,026,935 rows and 53 columns.

write_rds(acra_data,
          "data/rds/acra_data.rds")

2.3.2 Tidying ACRA data

Standardisation of Key Fields
Column names were renamed to a common schema (date, ssic_raw, postal_code). Dates were converted into proper R Date objects. SSIC codes were cleaned to numeric strings, and 2-digit (ssic_2d) and 3-digit (ssic_3d) prefixes were extracted. Postal codes were also padded to six digits for consistency.

Temporal Filtering
Only businesses registered between 01-Jan-2024 and 30-Jun-2025 were retained, aligning with the study’s spatio-temporal scope. Records with missing or invalid dates were excluded.

Business Type Selection
The dataset was filtered to include only the Food and Beverage / Hospitality categories: Division 56 – Food & Beverage Service Activities, Group 107 – Manufacture of Other Food Products, and Group 110 – Manufacture of Beverages.

date_min <- as_date("2024-01-01")
date_max <- as_date("2025-06-30")
acra_all_fnb <- acra_data %>%
  # keep the first 24 cols 
  select(1:24) %>%
  # commonize column names 
  rename(
    date        = registration_incorporation_date,
    ssic_raw    = primary_ssic_code,
    postal_code = postal_code
  ) %>%
  mutate(
    # dates
    date = as_date(date),
    # SSIC as character 
    ssic_raw = as.character(ssic_raw),
    ssic_raw = str_replace_all(ssic_raw, "[^0-9]", ""),   
    ssic_2d  = str_sub(ssic_raw, 1, 2),
    ssic_3d  = str_sub(ssic_raw, 1, 3),
    # clean postal code to 6-digit with zeros
    postal_code = str_pad(postal_code, width = 6, side = "left", pad = "0")
  ) %>%
  
  filter(!is.na(date), date >= date_min, date <= date_max) %>%
  # target Division 56 OR Group 107 OR Group 110
  filter(ssic_2d == "56" | ssic_3d %in% c("107", "110")) %>%
  # time fields
  mutate(
    YEAR       = year(date),
    MONTH_NUM  = month(date),
    MONTH_ABBR = month(date, label = TRUE, abbr = TRUE),
    fnb_bucket = case_when(
      ssic_2d == "56"          ~ "F&B Services (Div 56)",
      ssic_3d == "107"         ~ "Food Manufacturing (Grp 107)",
      ssic_3d == "110"         ~ "Beverage Manufacturing (Grp 110)",
      TRUE                     ~ "Other"
    )
  )

Sanity Checks & Subsets
Basic counts by fnb_bucket were generated to confirm category distribution. Monthly registration summaries were produced to validate temporal coverage. Separate data frames (biz_56, biz_107, biz_110) were created for more detailed subgroup analysis.

# Counts by bucket
acra_all_fnb %>% count(fnb_bucket, sort = TRUE)
# A tibble: 3 × 2
  fnb_bucket                           n
  <chr>                            <int>
1 F&B Services (Div 56)             5758
2 Food Manufacturing (Grp 107)       737
3 Beverage Manufacturing (Grp 110)    30
# Monthly registrations for later plotting
acra_all_fnb %>%
  count(YYYYMM = floor_date(date, "month"), fnb_bucket) %>%
  arrange(YYYYMM, fnb_bucket)
# A tibble: 52 × 3
   YYYYMM     fnb_bucket                           n
   <date>     <chr>                            <int>
 1 2024-01-01 Beverage Manufacturing (Grp 110)     2
 2 2024-01-01 F&B Services (Div 56)              337
 3 2024-01-01 Food Manufacturing (Grp 107)        51
 4 2024-02-01 Beverage Manufacturing (Grp 110)     1
 5 2024-02-01 F&B Services (Div 56)              236
 6 2024-02-01 Food Manufacturing (Grp 107)        39
 7 2024-03-01 Beverage Manufacturing (Grp 110)     1
 8 2024-03-01 F&B Services (Div 56)              338
 9 2024-03-01 Food Manufacturing (Grp 107)        43
10 2024-04-01 Beverage Manufacturing (Grp 110)     1
# ℹ 42 more rows
# separate data frames for each bucket
biz_56  <- acra_all_fnb %>% filter(ssic_2d == "56")
biz_107 <- acra_all_fnb %>% filter(ssic_3d == "107")
biz_110 <- acra_all_fnb %>% filter(ssic_3d == "110")
# Date column named `date` and a month column for plotting later:
acra_all_fnb <- acra_all_fnb %>%
  mutate(
    date  = as.Date(date),
    month = floor_date(date, "month"),
    postal_code = stringr::str_pad(postal_code, 6, pad = "0")
  )

biz_56  <- biz_56  %>% mutate(postal_code = stringr::str_pad(postal_code, 6, pad = "0"))
biz_107 <- biz_107 %>% mutate(postal_code = stringr::str_pad(postal_code, 6, pad = "0"))
biz_110 <- biz_110 %>% mutate(postal_code = stringr::str_pad(postal_code, 6, pad = "0"))

2.3.3 Geocoding

Extract Unique Postcodes
Unique six-digit postal codes were extracted from the ACRA dataset. To ensure consistency, codes were padded with leading zeros where necessary and invalid entries (missing or non-numeric) were excluded.
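Since the tidying step already padded codes to six digits, the remaining exclusion step described above can be sketched in base R; `valid_postcodes` is a hypothetical helper, shown here for illustration:

```r
# Hypothetical helper: keep only well-formed six-digit postal codes,
# dropping NA and anything containing non-digit characters.
valid_postcodes <- function(codes) {
  codes <- unique(as.character(codes))
  codes[!is.na(codes) & grepl("^[0-9]{6}$", codes)]
}

valid_postcodes(c("428770", "0104", NA, "018956", "428770"))
# "428770" "018956"
```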

Query the OneMap API
Each postcode was sent to the OneMap Singapore API using the code chunk below. The API returned address details and coordinates (X, Y in SVY21 projection; LATITUDE, LONGITUDE in WGS84). Postcodes with no matches were logged separately for later review.

postcodes <- unique(acra_all_fnb$postal_code)

url <- "https://onemap.gov.sg/api/common/elastic/search"

found <- data.frame()
not_found <- data.frame(postcode = character())

for (pc in postcodes) {
  query <- list(
    searchVal = pc,
    returnGeom = "Y",
    getAddrDetails = "Y",
    pageNum = "1"
  )
  
  res <- GET(url, query = query)
  json <- content(res)
  
  if(json$found != 0) {
    df <- as.data.frame(json$results, stringsAsFactors = FALSE)
    df$input_postcode <- pc
    found <- bind_rows(found, df)
  } else {
    not_found <- bind_rows(not_found, data.frame(postcode = pc))
  }
}

Append Geocoding Results
Geocoding was performed by extracting unique six-digit postal codes from the ACRA dataset and submitting them to the OneMap Singapore API. The API returned both address details and geographic coordinates, which were then joined back to the business records. Invalid or missing coordinates were cleaned, and duplicates were removed to ensure data quality. The final datasets were converted into spatial features (sf objects) in the SVY21 projection (EPSG:3414), enabling integration with planning boundaries and spatial analysis. To ensure reproducibility and avoid repeated API calls, the geocoded datasets and unresolved cases were saved as .rds files.

found <- found %>%
  #X/Y from OneMap are SVY21 (EPSG:3414); coerce to numeric
  mutate(
    X = suppressWarnings(as.numeric(X)),
    Y = suppressWarnings(as.numeric(Y))
  ) %>%
  # keep the essential columns
  select(1:10)
acra_all_fnb <- acra_all_fnb %>% left_join(found, by = c("postal_code" = "POSTAL"))
biz_56       <- biz_56       %>% left_join(found, by = c("postal_code" = "POSTAL"))
biz_107      <- biz_107      %>% left_join(found, by = c("postal_code" = "POSTAL"))
biz_110      <- biz_110      %>% left_join(found, by = c("postal_code" = "POSTAL"))
df_to_sf <- function(df, id_col = NULL) {
  df2 <- df %>%
    mutate(across(any_of(c("X","Y","LONGITUDE","LATITUDE")),
                  ~ suppressWarnings(as.numeric(.x))))

  if (all(c("X","Y") %in% names(df2))) {
    df2 <- df2 %>% filter(!is.na(X), !is.na(Y))
    if (!is.null(id_col) && id_col %in% names(df2)) {
      df2 <- df2 %>% group_by(.data[[id_col]]) %>% slice(1) %>% ungroup()
    }
    st_as_sf(df2, coords = c("X","Y"), crs = 3414)   # SVY21
  } else if (all(c("LONGITUDE","LATITUDE") %in% names(df2))) {
    df2 <- df2 %>% filter(!is.na(LONGITUDE), !is.na(LATITUDE))
    if (!is.null(id_col) && id_col %in% names(df2)) {
      df2 <- df2 %>% group_by(.data[[id_col]]) %>% slice(1) %>% ungroup()
    }
    st_as_sf(df2, coords = c("LONGITUDE","LATITUDE"), crs = 4326)  
  } else {
    stop("No coordinate columns found. Expected X/Y or LONGITUDE/LATITUDE. Present: ",
         paste(names(df2), collapse = ", "))
  }
}

acra_all_fnb_sf <- df_to_sf(acra_all_fnb, id_col = "uen")
biz_56_sf       <- df_to_sf(biz_56,       id_col = "uen")
biz_107_sf      <- df_to_sf(biz_107,      id_col = "uen")
biz_110_sf      <- df_to_sf(biz_110,      id_col = "uen")
write_rds(acra_all_fnb, "data/rds/acra_all_fnb.rds")
write_rds(biz_56,       "data/rds/biz_56.rds")
write_rds(biz_107,      "data/rds/biz_107.rds")
write_rds(biz_110,      "data/rds/biz_110.rds")
write_rds(not_found,    "data/rds/postcodes_not_found.rds") 
biz_missing <- acra_all_fnb %>%
  mutate(X = suppressWarnings(as.numeric(X)), Y = suppressWarnings(as.numeric(Y))) %>%
  filter(is.na(X) | is.na(Y))
write_rds(biz_missing, "data/rds/biz_missing_coords.rds")
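Re-using the saved .rds files to avoid repeated API calls follows a simple cache-or-build pattern. A minimal sketch, assuming a hypothetical `read_or_build` helper (not part of the workflow above):

```r
# Hypothetical cache-or-build helper: read the cached object if it exists,
# otherwise run the (possibly expensive) builder once and cache its result.
read_or_build <- function(path, builder) {
  if (file.exists(path)) {
    readRDS(path)          # cache hit: no API calls needed
  } else {
    x <- builder()         # cache miss: e.g. run the OneMap geocoding loop
    saveRDS(x, path)
    x
  }
}

# e.g. found <- read_or_build("data/rds/found.rds",
#                             function() geocode_postcodes(postcodes))
# where geocode_postcodes() is a hypothetical wrapper around the query loop shown earlier.
```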

Visualising the distribution

# Load URA Subzone boundaries
subzones <- st_read("data/geospatial/MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson") %>%
  st_transform(crs = 3414)
Reading layer `MasterPlan2019SubzoneBoundaryNoSeaGEOJSON' from data source 
  `C:\sonphamsmu\isss626-aug2526\Take-home_Ex\Take-home_Ex1\data\geospatial\MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson' 
  using driver `GeoJSON'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
# Spatial join: assign each business to a subzone
acra_all_fnb_sf <- st_join(acra_all_fnb_sf, subzones)
tmap_mode("view")  # interactive map
tm_shape(subzones) + tm_borders() +
tm_shape(acra_all_fnb_sf) + tm_dots(col = "fnb_bucket", size = 0.05)
tmap_mode("plot")

The code chunk below reveals the distribution of newly registered businesses by month.

monthly_counts <- acra_all_fnb %>%
  mutate(date = as.Date(date), month = floor_date(date, "month")) %>%
  count(month)

ggplot(monthly_counts, aes(x = month, y = n)) +
  geom_col(fill = "steelblue") +
  scale_x_date(date_labels = "%b-%Y", date_breaks = "1 month") +
  labs(x = "Month of Registration", y = "Number of Businesses",
       title = "Monthly New F&B/Hospitality Registrations (Jan 2024 – Jun 2025)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
